This article explores sequencing coverage fundamentals. Uncover key concepts and discover how highly accurate long-read sequencing provides a comprehensive view of the genome, at any coverage level.
What is sequencing coverage?
Genomics professionals use the terms “sequencing coverage” or “sequencing depth” to describe the number of unique sequencing reads that align to a region in a reference genome or de novo assembly.
A 30x human genome is one in which, on average, about 30 reads align to any given position in the reference. In practical terms, the higher the sequencing depth, the more times each part of the genome is read, and the more accurate and reliable the resulting information.
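Average depth is commonly estimated with the Lander-Waterman relation C = N × L / G, where N is the number of reads, L is the mean read length, and G is the genome size. The minimal Python sketch below applies the relation in both directions. The ~3.1 Gb genome size and the 15 kb and 150 bp read lengths are illustrative assumptions, not figures from this article.

```python
import math

def mean_coverage(num_reads: int, mean_read_length_bp: float, genome_size_bp: float) -> float:
    """Average depth C = N * L / G (Lander-Waterman)."""
    return num_reads * mean_read_length_bp / genome_size_bp

def reads_needed(target_coverage: float, mean_read_length_bp: float, genome_size_bp: float) -> int:
    """Invert C = N * L / G to estimate how many reads reach a target average depth."""
    return math.ceil(target_coverage * genome_size_bp / mean_read_length_bp)

# Assumed values, for illustration only.
GENOME_BP = 3_100_000_000   # approximate human genome size
LONG_READ_BP = 15_000       # hypothetical long-read length
SHORT_READ_BP = 150         # hypothetical short-read length

long_reads = reads_needed(30, LONG_READ_BP, GENOME_BP)
short_reads = reads_needed(30, SHORT_READ_BP, GENOME_BP)
total_bases = long_reads * LONG_READ_BP  # sequence needed for 30x

print(long_reads, short_reads, total_bases)
print(mean_coverage(long_reads, LONG_READ_BP, GENOME_BP))
```

Under these assumptions, 30x of a ~3.1 Gb genome is about 93 Gb of sequence whatever the read length. Longer reads reach that total with far fewer reads.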
Why is sequencing coverage important?
Sequencing coverage is important in genomics because more coverage gives researchers greater statistical confidence that their results, and the conclusions that are drawn from them, are correct.
Increased coverage assures scientists that what they observed was not a fluke or random error but a genuine attribute of the biological sample.
In science, having statistical confidence in the outcome of an experiment is very important. If you flip a coin three times, there is a good chance (75%) that it lands on the same side exactly two out of three times. If you stopped there, you might conclude that such coins land on one side more often than the other (two-thirds, or about 67%, of the time). But a sample of three is small. What if the flips you observed were merely the result of chance? If you flipped the coin 30 or 100 times, you would most likely find that the results fall much closer to a 50/50 split.
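The binomial distribution makes the coin-flip intuition exact. The Python sketch below assumes a fair coin and computes the probability that either side comes up at least two-thirds of the time. The function name `prob_lopsided` and the two-thirds threshold are illustrative choices.

```python
import math
from fractions import Fraction

def prob_lopsided(n_flips: int, share: Fraction = Fraction(2, 3)) -> float:
    """Probability that a fair coin lands on the same side (heads or tails)
    in at least `share` of n_flips, using exact binomial counts."""
    k = math.ceil(share * n_flips)  # minimum count for a "lopsided" result
    favorable = sum(
        math.comb(n_flips, heads)
        for heads in range(n_flips + 1)
        if heads >= k or (n_flips - heads) >= k
    )
    return favorable / 2**n_flips

for n in (3, 30, 100):
    print(n, round(prob_lopsided(n), 4))
```

With three flips the probability is exactly 1.0, because three flips always split 2-1 or 3-0. At 30 flips it falls to roughly 10%, and at 100 flips it is well under 1%. More observations make a chance result much less likely to masquerade as a real effect, and higher coverage works the same way.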
Are genomes of equal sequencing coverage of equal scientific value?
Genomes with equal sequencing coverage are not necessarily of equal scientific value.
Many factors influence the explanatory power of a genome alignment or assembly. Beyond the underlying factors that can affect the biological value of a genome (e.g., assumptions, sample quality, and experimental design), the uniformity of coverage and the accuracy of individual reads can make one genome far more scientifically valuable than another. A technology comparison study illustrates this well: for de novo assembly of the Saccharomyces cerevisiae genome, the authors found that 20x coverage with highly accurate long-read PacBio HiFi data exceeded the utility of 20x (and even 80x) coverage with nanopore sequencing [1].
What coverage uniformity is, and why it is important
Coverage uniformity tells us how evenly distributed individual reads are across the genome or region of interest.
Two genomes could be sequenced to the same average coverage (e.g., 30x), yet the first could have low uniformity (some genes not covered at all and others covered 60 times), while the second has highly uniform coverage, with every gene or region covered 25 to 35 times. At face value, both are 30x genomes. However, the first is of lower quality, with gaps in some areas and excellent coverage in others. The second provides respectable confidence throughout, which makes it more useful for interpreting biology across the whole genome.
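A few summary metrics over per-gene (or per-position) depths make this comparison concrete. The Python sketch below uses hypothetical depth values that mirror the two genomes described above. The 25 to 35x band and the use of the coefficient of variation (standard deviation divided by mean) as a spread measure are illustrative assumptions, not a standard PacBio metric.

```python
from statistics import mean, pstdev

def uniformity_summary(depths, low=25, high=35):
    """Summarize how evenly reads are spread across genes or positions."""
    avg = mean(depths)
    n = len(depths)
    return {
        "mean_depth": avg,
        # Fraction of genes/positions with no reads at all (gaps).
        "zero_fraction": sum(d == 0 for d in depths) / n,
        # Fraction whose depth falls within the [low, high] band.
        "in_band_fraction": sum(low <= d <= high for d in depths) / n,
        # Coefficient of variation: spread relative to the mean (0 = perfectly even).
        "cv": pstdev(depths) / avg,
    }

# Hypothetical per-gene depths for two "30x" genomes (illustrative numbers only).
uneven = [0, 60, 0, 60, 0, 60, 0, 60]
even = [25, 35, 30, 28, 32, 30, 27, 33]

print(uniformity_summary(uneven))
print(uniformity_summary(even))
```

Both profiles report a mean depth of 30. Only the zero fraction, in-band fraction, and coefficient of variation reveal that the first genome leaves half of its genes uncovered.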
How much sequencing coverage is necessary?
The right level of sequencing coverage for a study can vary widely depending on the goals of a project and how the results may be applied. Factors to consider include but are not limited to:
- The type of sequencing technology being used.
- The ploidy of the genome.
- The complexity or rarity of the variant or attribute you wish to study.
- Sample quality/level of degradation.
- The desired level of statistical confidence/power.
- Requirements set by peer-reviewed journals or data repositories.
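Two of these factors, ploidy and the desired statistical confidence, can be explored with a simple idealized model. The Python sketch below assumes that reads land uniformly at random, so that depth at a position follows a Poisson distribution whose mean is the average coverage. It also assumes that each read at a heterozygous site comes from either haplotype with probability 0.5. Real coverage is less even than this, and the thresholds used here (10 reads minimum, 3 reads per allele) are hypothetical. The results should therefore be treated as optimistic.

```python
import math

def frac_below_depth(mean_cov: float, min_depth: int) -> float:
    """Expected fraction of positions with fewer than `min_depth` reads,
    assuming depth ~ Poisson(mean_cov) (idealized uniform sampling)."""
    return sum(
        math.exp(-mean_cov) * mean_cov**i / math.factorial(i)
        for i in range(min_depth)
    )

def prob_het_seen(depth: int, min_reads_per_allele: int) -> float:
    """Probability that both alleles at a heterozygous (diploid) site are each
    supported by at least `min_reads_per_allele` reads, given `depth` reads
    that sample each haplotype with probability 0.5."""
    m = min_reads_per_allele
    favorable = sum(math.comb(depth, k) for k in range(m, depth - m + 1))
    return favorable / 2**depth

for cov in (8, 20, 30):
    print(cov, frac_below_depth(cov, 10), prob_het_seen(cov, 3))
```

Raising the average depth shrinks the fraction of under-covered positions and raises the chance that both alleles of a heterozygous site are well supported. The same logic explains why higher ploidy or rarer variants generally demand more coverage.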
Rethink what can be achieved at any level of coverage
Every so often, new technologies arrive that redefine how we approach science. The current standard for whole genome sequencing coverage was established when short-read sequencing by synthesis (SBS) technology was still new [2]. As a result, the 30x coverage benchmark in many ways reflects the capabilities and limitations of that technology. Today, the technological landscape of DNA sequencing looks quite different.
For example, a comparison against recent Illumina and ONT datasets, titrating HiFi coverage from 8x to 40x, highlights the superior variant calling performance of PacBio HiFi. All three sequencing technologies show similar SNV performance but diverge on indels, where HiFi performs significantly better than ONT (60x) and is gaining on Illumina (35x). For structural variants (SVs), HiFi sequencing demonstrates industry-leading performance: 20x HiFi surpasses 60x ONT and outperforms Illumina by 39%.
Titrating HiFi coverage shows that a 20x HiFi genome achieves over 99% of the 30x F1 score for SNVs and SVs, and over 98% of the 30x F1 score for indels. An independent study further supports the value of a 20x HiFi genome, showing that 20x HiFi coverage recalls 96.2% of the difficult, clinically relevant germline variants identified at 30x. Because an average depth of 20x identifies nearly all the variation found at 30x, and more than other technologies do, we recommend a 20x HiFi genome to balance accuracy and cost for most reference-based applications.
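F1 is the harmonic mean of precision and recall, so it penalizes both false calls and missed variants. The Python sketch below shows how an F1 score and a "percent of the 30x F1" ratio are computed. It also computes the recall of a 20x call set against a 30x call set, as in the independent study described above. All counts and variant tuples are hypothetical placeholders, not data from the cited comparisons.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written equivalently as 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

def recall_against(reference_calls: set, test_calls: set) -> float:
    """Fraction of variants in the reference call set that the test set also finds."""
    return len(reference_calls & test_calls) / len(reference_calls)

# Hypothetical benchmark counts (true positives, false positives, false negatives).
f1_30x = f1_score(tp=990, fp=10, fn=10)
f1_20x = f1_score(tp=985, fp=12, fn=15)
print(f"20x reaches {f1_20x / f1_30x:.1%} of the 30x F1")

# Hypothetical variants as (chromosome, position, ref, alt) tuples.
calls_30x = {("chr1", 1000, "A", "G"), ("chr2", 5000, "C", "T"),
             ("chr7", 1200, "G", "GA"), ("chrX", 800, "T", "C")}
calls_20x = {("chr1", 1000, "A", "G"), ("chr2", 5000, "C", "T"),
             ("chr7", 1200, "G", "GA")}
print(f"20x recalls {recall_against(calls_30x, calls_20x):.1%} of 30x calls")
```

Expressing 20x results as a fraction of the 30x F1 (or recall of the 30x call set) measures how much is lost by sequencing less. The absolute F1 against a truth set is what compares technologies.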
At present, PacBio long-read WGS can provide information that is practically unattainable with SBS short-read sequencers at any level of sequencing coverage [7]. Structural variation, native 5mC epigenetic calling, haplotype phasing, and accurate, uniform coverage of the entire genome are now part and parcel of a standard PacBio HiFi long-read sequencing run on both the Revio and Vega systems. That coverage extends to what used to be called “dark regions” (such as large repeat expansions, GC-rich areas, and centromeric regions).
In the era of high-throughput long-read sequencing, there is still some debate about what the new baseline for sequencing coverage in human genomics should be. Nevertheless, it is evident that the old standard is undergoing significant change, and we are excited to see where it takes us.
Are you interested in finding out why PacBio long-read sequencing requires less coverage than other technologies? Check out this article on long-read sequencing and our Application brief.
Rich genomic information at lower coverage seems great, but what does it cost?
At about 500 USD*† per 20x human genome on the Revio system with SPRQ chemistry, a truly “whole” whole genome with the full complement of genomic information is now more affordable and information-rich than ever. This is especially true when you consider the time and money saved by skipping the additional experiments that short-read SBS technology requires to achieve lesser insights. For a PacBio long-read human genome, the Revio system is optimized to produce two 20x HiFi human whole genomes per SMRT Cell with a ~24-hour turnaround time and without batching. That makes each 20x HiFi whole genome about 500 USD* and ripe for enabling novel discoveries.
These advances offer exciting possibilities and raise one crucial question: what groundbreaking discoveries will you make?
Are you interested in investing in a PacBio system or sequencing through a core facility or service provider? Would you like to pinpoint the correct level of coverage to achieve your project goals with PacBio sequencing?
Speak with a PacBio scientist to explore options.
References
1. Zhang X, Liu CG, Yang SH, Wang X, Bai FW, Wang Z. “Benchmarking of long-read sequencing, assemblers and polishers for yeast genome.” Briefings in Bioinformatics 23(3), bbac146 (2022). https://doi.org/10.1093/bib/bbac146
2. Bentley DR, et al. “Accurate whole human genome sequencing using reversible terminator chemistry.” Nature 456(7218), 53–59 (2008). https://doi.org/10.1038/nature07517
3. Kong SW, et al. “Measuring coverage and accuracy of whole-exome sequencing in clinical context.” Genetics in Medicine 20(12), 1617–1626 (2018). https://doi.org/10.1038/gim.2018.51
4. Sims D, Sudbery I, Ilott N, et al. “Sequencing depth and coverage: key considerations in genomic analyses.” Nature Reviews Genetics 15, 121–132 (2014). https://doi.org/10.1038/nrg3642
5. Illumina. “Evaluating Somatic Variant Calling in Tumor/Normal Studies.” White paper. https://www.illumina.com/content/dam/illumina-marketing/documents/products/whitepapers/whitepaper_wgs_tn_somatic_variant_calling.pdf
6. Meggendorfer M, et al. “Analytical demands to use whole-genome sequencing in precision oncology.” Seminars in Cancer Biology 84, 16–22 (2022). https://doi.org/10.1016/j.semcancer.2021.06.009
7. De Coster W, Weissensteiner MH, Sedlazeck FJ. “Towards population-scale long-read sequencing.” Nature Reviews Genetics 22, 572–587 (2021). https://doi.org/10.1038/s41576-021-00367-3
8. Harvey WT, et al. “Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall.” bioRxiv preprint 2023.05.04.539448 (2023). https://doi.org/10.1101/2023.05.04.539448
* Study design, sample type, and level of multiplexing may affect the number of SMRT Cells required. Costs may vary by region. Pricing includes library and sequencing reagents run on the Revio system and does not include instrument amortization or other reagents. Pricing information is current as of July 2025.
† Requires multiplexing 2 human genome libraries (20x) per SMRT Cell to achieve price estimate.